Lecture 24

Virtual Memory

## Virtual Memory

- Some facts of computer life...
  - Computers run lots of processes simultaneously
  - There is not enough physical memory to give each process a full address space
    - Physical memory is expensive and not dense, and thus too small
  - Must share smaller amounts of physical memory among many processes
- Virtual memory is the answer!
  - Divides physical memory into blocks, assigns them to different processes
    - Compiler assigns data to a "virtual" address.
      - VA is translated to a real/physical address somewhere in memory
    - Allows program to run anywhere; where is determined by a particular machine, OS
      - + Business: common SW on wide product line (w/o VM, sensitive to actual physical memory size)

#### Lecture 24 - Architectural Support for Virtual Memory

<u>Virtual address space</u> greater than <u>Logical address space</u>




## The gist of virtual memory

- Relieves problem of making a program that was too large to fit in physical memory – well...fit!
- Allows program to run in any location in physical memory
  - Really useful, as you might want to run the same program on lots of machines...



Logical program is in contiguous VA space; here, pages: A, B, C, D; (3 are in main memory and 1 is located on the disk)

## Some definitions and cache comparisons

- The bad news:
  - In order to understand exactly how virtual memory works, we need to define some terms
- The good news:
  - Virtual memory is very similar to a cache structure
- So, some definitions/"analogies"
  - A "page" or "segment" of memory is analogous to a "block" in a cache
  - A "page fault" or "address fault" is analogous to a cache miss

So, if we go to main memory and our data isn't there, we need to get it from disk...


## System Maps VA To PA (VPN to PFN)

key word in that sentence? "system"

- individual processes do not perform mapping
- same VPNs in different processes map to different PFNs
- + protection: processes cannot use each other's PAs
- + programming made easier: each process thinks it is alone
- + *relocation*: program can be run anywhere in memory; it doesn't have to be physically contiguous
  - can be paged out, then paged back in to a different physical location

"system": something user process can't directly use via ISA

OS or purely microarchitectural part of processor

© 2004 by Lebeck, Sorin, Roth, Hill, Wood, Sohi, Smith, Vijaykumar, Lipasti

COMPSCI 220 / ECE 252 Lecture Notes Storage Hierarchy II: Main Memory


## Virtual Memory: The Story




## Virtual Memory: The Four Questions

same four questions, different four answers

- page placement: fully (or very highly) associative. why?
- page identification: address translation (will discuss soon)
- page replacement: sophisticated (LRU + "working set"). why?
- write strategy: always write-back + write-allocate. why?

Might think about these 2 simultaneously

## The Answer Behind the Four Answers

backing store to main memory is disk

- memory is 50 to 100 times slower than processor
- disk is 20 to 100 *thousand* times slower than memory
  - so disk is 1 to 10 *million* times slower than processor
- a VA miss (VPN has no PFN) is called a page fault
  - high cost of page fault determines design
  - full associativity + OS replacement ⇒ reduce miss rate
    - have time to let software get involved, make better decisions
  - write-back reduces disk traffic
  - page size usually large (4KB to 16KB) to amortize reads


## Virtual Memory

- Timing's tough with virtual memory:
  - $AMAT = T_{mem} + (1-h) \times T_{disk}$
  - $= 100\,\text{ns} + (1-h) \times 25{,}000{,}000\,\text{ns}$
- h (hit rate) has to be <u>incredibly</u> (almost unattainably) close to perfect for this to work
- so: VM is a "cache", but an odd one.
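To get a feel for how close to perfect the hit rate must be, the AMAT formula can be evaluated for a few values of h. This is a minimal sketch using the 100 ns and 25,000,000 ns figures from the slide; the function names `amat_ns` and `required_hit_rate` and the 110 ns target are illustrative assumptions.

```python
def amat_ns(hit_rate, t_mem_ns=100.0, t_disk_ns=25_000_000.0):
    """AMAT = T_mem + (1 - h) * T_disk, all times in nanoseconds."""
    return t_mem_ns + (1.0 - hit_rate) * t_disk_ns


def required_hit_rate(target_amat_ns, t_mem_ns=100.0, t_disk_ns=25_000_000.0):
    """Solve the AMAT formula for h given a target AMAT."""
    return 1.0 - (target_amat_ns - t_mem_ns) / t_disk_ns


for h in (0.99, 0.9999, 0.999999):
    print(f"h = {h}: AMAT = {amat_ns(h):,.0f} ns")

# Hit rate needed to stay within 10% of the raw memory access time
# (the 110 ns target is an arbitrary choice for illustration).
print(f"h needed for 110 ns: {required_hit_rate(110.0):.7f}")
```

Even a 99% hit rate gives an AMAT of roughly 250,000 ns, about 2,500 times the raw memory access time, which is why page faults have to be extremely rare.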

## **Compare Levels of Memory Hierarchy**




## Introduction to page translation (on the board)

## **Test Yourself**

A processor asks for the contents of virtual memory address 0x10020. The paging scheme in use breaks this into a VPN of 0x10 and an offset of 0x020.

PTR (a CPU register that holds the address of the page table) has a value of 0x100, indicating that this process's page table starts at location 0x100.

The machine uses word addressing and the page table entries are each one word long.

|                  | VPN   | OFFSET |
|------------------|-------|--------|
| Memory Reference | 0x010 | 0x020  |

| ADDR    | CONTENTS |
|---------|----------|
| 0x00000 | 0x00000  |
| 0x00100 | 0x00010  |
| 0x00110 | 0x00022  |
| 0x00120 | 0x00045  |
| 0x00130 | 0x00078  |
| 0x00145 | 0x00010  |
| 0x10000 | 0x03333  |
| 0x10020 | 0x04444  |
| 0x22000 | 0x01111  |
| 0x22020 | 0x02222  |
| 0x45000 | 0x05555  |
| 0x45020 | 0x06666  |
|         |          |

| Register | Value |
|----------|-------|
| PTR      | 0x100 |

- What is the physical address calculated?
  1. 0x10020
  2. 0x22020
  3. 0x45000
  4. 0x45020
  5. none of the above


## Test Yourself


- What is the physical address calculated?
- What are the contents of this address returned to the processor?
- How many memory accesses in total were required to obtain the contents of the desired address?
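One way to check your answers is to trace the lookup in code. The sketch below copies the memory contents from the exercise table and assumes two things the slide does not state outright: the offset is 12 bits wide (three hex digits), and each page table entry holds just the frame number with no status bits. The helper names are hypothetical.

```python
# Word-addressed memory from the exercise table: address -> contents.
memory = {
    0x00000: 0x00000, 0x00100: 0x00010, 0x00110: 0x00022,
    0x00120: 0x00045, 0x00130: 0x00078, 0x00145: 0x00010,
    0x10000: 0x03333, 0x10020: 0x04444, 0x22000: 0x01111,
    0x22020: 0x02222, 0x45000: 0x05555, 0x45020: 0x06666,
}

OFFSET_BITS = 12  # assumption: offset 0x020 is three hex digits wide
PTR = 0x100       # page table base register from the exercise

trace = []        # every memory address touched, in order


def read_word(addr):
    """Read one word and record the access."""
    trace.append(addr)
    return memory[addr]


def translate(va):
    """Split the VA, fetch the PTE, and rebuild the physical address."""
    vpn = va >> OFFSET_BITS
    offset = va & ((1 << OFFSET_BITS) - 1)
    pfn = read_word(PTR + vpn)  # one-word PTEs on a word-addressed machine
    return (pfn << OFFSET_BITS) | offset


pa = translate(0x10020)
data = read_word(pa)
print(hex(pa), hex(data), len(trace))  # prints: 0x22020 0x2222 2
```

Under these assumptions the PTE sits at 0x100 + 0x10 = 0x110, which holds frame number 0x22, and the reference costs two memory accesses: one for the PTE and one for the data.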


## Address Translation: Page Tables

OS performs address translation using a *page table*

- each process has its own page table
  - OS knows address of each process's page table
- a page table is an array of page table entries (PTEs)
  - one for each VPN of each process, indexed by VPN
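The structure described above can be sketched as one array of PTEs per process, indexed by VPN. This is an illustrative model only: the 4KB page size, the `PTE` fields, the `PageFault` exception, and the process names are assumptions, not part of the lecture.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4KB pages


@dataclass
class PTE:
    pfn: int = 0
    valid: bool = False  # False means the VPN currently has no PFN


class PageFault(Exception):
    """Raised when a VPN has no valid mapping."""


class PageTable:
    def __init__(self, num_pages):
        # One PTE per virtual page, indexed directly by VPN.
        self.entries = [PTE() for _ in range(num_pages)]

    def map(self, vpn, pfn):
        self.entries[vpn] = PTE(pfn=pfn, valid=True)

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        pte = self.entries[vpn]
        if not pte.valid:
            raise PageFault(f"VPN {vpn:#x} has no PFN")
        return pte.pfn * PAGE_SIZE + offset


# The "system" keeps one table per process; the same VPN maps to
# different frames in different processes.
page_tables = {"A": PageTable(16), "B": PageTable(16)}
page_tables["A"].map(vpn=2, pfn=7)
page_tables["B"].map(vpn=2, pfn=3)

va = 2 * PAGE_SIZE + 0x10
print(hex(page_tables["A"].translate(va)))  # 0x7010
print(hex(page_tables["B"].translate(va)))  # 0x3010
```

Because each process only ever reaches memory through its own table, neither process can name the other's frames, which is the protection property noted earlier.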



## **Review:** Paging Hardware



## **Review:** Address Translation




## Page Table Size

- example #1: 32-bit VA, 4KB pages, 4-byte PTE
  - 1M pages (32 bits = 4 GB address space / 4 KB page = 1M pages)
  - 1M pages \* 4 bytes = 4MB page table (bad, but could be worse)
- example #2: 64-bit VA, 4KB pages, 4-byte PTE
  - 4P pages, 16PB page table (not a viable option)
- upshot: can't have page tables of this size in memory
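The arithmetic in both examples is the same calculation with different address widths. A short sketch (the function name `page_table_bytes` is an assumption) reproduces the numbers for a flat, single-level page table:

```python
def page_table_bytes(va_bits, page_bytes, pte_bytes):
    """Return (number of virtual pages, flat page table size in bytes)."""
    num_pages = 2**va_bits // page_bytes
    return num_pages, num_pages * pte_bytes


pages32, size32 = page_table_bytes(32, 4096, 4)
pages64, size64 = page_table_bytes(64, 4096, 4)

print(pages32, size32 // 2**20, "MB")  # 2**20 pages, 4 MB
print(pages64, size64 // 2**50, "PB")  # 2**52 pages, 16 PB
```

Here MB and PB are taken as powers of two (2^20 and 2^50 bytes), matching the slide's use of 1M = 2^20.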

techniques for reducing page table size

- multi-level page tables
- inverted page tables




| Address | Physical memory |
|---------|-----------------|
| 0     |            |
| 1     |            |
| 2     |            |
| 3     |            |
| 4     | i          |
| 5     | j          |
| 6     | k          |
| 7     | l          |
| 8     | m          |
| 9     | n          |
| 10    | o          |
| 11    | p          |
| 12    |            |
| 13    |            |
| 14    |            |
| 15    |            |
| 16    |            |
| 17    |            |
| 18    |            |
| 19    |            |
| 20    | a          |
| 21    | b          |
| 22    | c          |
| 23    | d          |
| 24    | e          |
| 25    | f          |
| 26    | g          |
| 27    | h          |
| 28    |            |
| 29    |            |
| 30    |            |
| 31    |            |

## Block replacement

- Which block should be replaced on a virtual memory miss?
  - Again, we'll stick with the strategy that it's a good thing to eliminate page faults
  - Therefore, we want to replace the LRU block
    - Many machines use a "use" or "reference" bit, set when a page is accessed
    - The bits are periodically reset
    - This gives the OS an estimate of which pages have been referenced recently
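The reference-bit idea can be sketched as follows. This is only one simple way an OS might use the bits to approximate LRU, not a description of any particular system; the class and method names are hypothetical.

```python
class ReferenceBits:
    """Tracks one reference bit per resident page."""

    def __init__(self, resident_pages):
        self.bits = {page: False for page in resident_pages}

    def access(self, page):
        # Hardware would set this bit on every reference to the page.
        self.bits[page] = True

    def periodic_reset(self):
        # The OS clears all bits at regular intervals.
        for page in self.bits:
            self.bits[page] = False

    def choose_victim(self):
        # Prefer a page not referenced since the last reset.
        for page, referenced in self.bits.items():
            if not referenced:
                return page
        # Every page was referenced: fall back to the first one.
        return next(iter(self.bits))


refs = ReferenceBits(["A", "B", "C", "D"])
refs.periodic_reset()
for page in ("A", "C", "D", "A"):
    refs.access(page)
print(refs.choose_victim())  # B, the only page not touched since the reset
```

The bits say only whether a page was touched in the current interval, not in what order, so this estimates recency rather than tracking true LRU.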

## Writing a block

- What happens on a write?
  - We don't even want to think about a write-through policy!
    - Disk access times are so long that writing through on every store is not practical
  - Instead, a write-back policy is used, with a dirty bit to tell if a block has been written
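A minimal sketch of write-back with a dirty bit is shown below. The `ResidentPage` class, the `evict` function, and the dictionary standing in for the disk are all illustrative assumptions.

```python
class ResidentPage:
    """A page held in main memory, with a dirty bit."""

    def __init__(self, vpn, data):
        self.vpn = vpn
        self.data = bytearray(data)
        self.dirty = False

    def write(self, offset, value):
        # Writes only touch memory; the disk copy becomes stale.
        self.data[offset] = value
        self.dirty = True


def evict(page, disk):
    """Write the page back only if it is dirty. Returns True on a disk write."""
    if page.dirty:
        disk[page.vpn] = bytes(page.data)
        page.dirty = False
        return True
    return False


disk = {0: bytes(4), 1: bytes(4)}  # hypothetical backing store: vpn -> bytes
clean = ResidentPage(0, disk[0])
dirty = ResidentPage(1, disk[1])
dirty.write(2, 0xFF)

print(evict(clean, disk))  # False: no disk traffic for an unmodified page
print(evict(dirty, disk))  # True: modified page is written back once
```

Many writes to the same page cost a single disk write at eviction time, which is how write-back reduces disk traffic.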


## Page tables and lookups...

1. it's slow! We've turned every access to memory into two accesses to memory
   - solution: add a specialized "cache" called a "translation lookaside buffer (TLB)" inside the processor
   - punt this issue for a lecture (until Thursday)
2. it's still huge!
   - even worse: we're ultimately going to have a page table for every *process*. Suppose 1024 processes at 4MB each: that's 4GB of page tables!

## Introduction to TLBs

Figure: Operating System, Physical Memory, Disk






## Let's talk more about TLBs on the board


## **Review:** Translation Cache

- Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
- TLBs are usually small, typically not more than 128-256 entries even on high-end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set associative organizations.
- Note: 128-256 entries times 4KB-16KB/entry is only 512KB-4MB; the L2 cache is often bigger than the "span" of the TLB.
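The "span" figures in the note above are just the entry count times the page size. A quick check, with `tlb_span_bytes` as an assumed helper name:

```python
KB = 1024
MB = 1024 * KB


def tlb_span_bytes(entries, page_bytes):
    """Memory covered by the TLB when every entry maps a distinct page."""
    return entries * page_bytes


low = tlb_span_bytes(128, 4 * KB)
high = tlb_span_bytes(256, 16 * KB)
print(low // KB, "KB to", high // MB, "MB")  # 512 KB to 4 MB
```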




## An example of a TLB




## **Review:** Translation Cache

A way to speed up translation is to use a special cache of recently used page table entries -- this has many names, but the most frequently used is *Translation Lookaside Buffer* or *TLB* 

| Virtual Page # (tag) | Physical Frame # | Dirty | Ref | Valid | Access |
|----------------------|------------------|-------|-----|-------|--------|
|                      |                  |       |     |       |        |
|                      |                  |       |     |       |        |
|                      |                  |       |     |       |        |
|                      |                  |       |     |       |        |
Really just a cache (a special-purpose cache) on the page table mappings

TLB access time comparable to cache access time (much less than main memory access time)
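The behavior of a TLB as a cache of page table mappings can be sketched in software. This model is fully associative and uses LRU replacement, which is an assumption chosen for simplicity since the slides do not specify a replacement policy; the `TLB` class and the dictionary standing in for the in-memory page table are hypothetical.

```python
from collections import OrderedDict


class TLB:
    """Small fully associative cache of VPN -> PFN translations."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # dict vpn -> pfn, stands in for memory
        self.entries = OrderedDict()  # VPN is the tag; order tracks recency
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        # Miss: consult the page table (an extra memory access in hardware).
        self.misses += 1
        pfn = self.page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = pfn
        return pfn


tlb = TLB(capacity=2, page_table={0: 5, 1: 6, 2: 1})
frames = [tlb.lookup(vpn) for vpn in (0, 1, 0, 2, 1)]
print(frames, tlb.hits, tlb.misses)  # [5, 6, 5, 1, 6] 1 4
```

Only the misses pay for a page table access; a hit returns the frame number without touching memory, which is where the speedup comes from.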


## The "big picture" and TLBs

- Address translation is usually on the critical path...
  - ...which determines the clock cycle time of the $\mu$P
- Even in the simplest cache, TLB values must be read and compared
- TLB is usually smaller and faster than the cache's address-tag memory
  - This way multiple TLB reads don't increase the cache hit time
- TLB accesses are usually pipelined because it's so important!


